Multi-Source Spatial Entity Linkage
Besides the traditional cartographic data sources, spatial information can
also be derived from location-based sources. However, even though different
location-based sources refer to the same physical world, each one has only
partial coverage of the spatial entities, describes them with different
attributes, and sometimes provides contradicting information. Hence, we
introduce the spatial entity linkage problem: identifying which pairs of
spatial entities refer to the same physical spatial entity. Our proposed
solution (QuadSky) starts with a time-efficient spatial blocking technique
(QuadFlex), compares the spatial entities in the same block pairwise, ranks the
pairs using Pareto optimality with the SkyRank algorithm, and finally,
classifies the pairs with our novel SkyEx-* family of algorithms that yield
0.85 precision and 0.85 recall for a manually labeled dataset of 1,500 pairs
and 0.87 precision and 0.6 recall for a semi-manually labeled dataset of
777,452 pairs. Moreover, we provide a theoretical guarantee and formalize the
SkyEx-FES algorithm that explores only 27% of the skylines without any loss in
F-measure. Furthermore, our fully unsupervised algorithm SkyEx-D approximates
the optimal result with an F-measure loss of just 0.01. Finally, QuadSky
provides the best trade-off between precision and recall, and the best
F-measure compared to the existing baselines and clustering techniques, and
approximates the results of supervised learning solutions.
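
The Pareto-based ranking step mentioned above can be illustrated with a generic skyline-peeling sketch. This is not the SkyRank algorithm from the paper, only an assumed minimal version of the underlying idea: each candidate pair is described by a vector of similarity scores (higher is better), and pairs are grouped into successive skylines of non-dominated vectors. The function names `dominates` and `skyline_layers` and the two-score example data are hypothetical.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b on every score and strictly better on one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def skyline_layers(points: List[Sequence[float]]) -> List[List[int]]:
    """Group point indices into successive Pareto fronts (layer 0 is the best)."""
    remaining = list(range(len(points)))
    layers: List[List[int]] = []
    while remaining:
        # A point belongs to the current skyline if no other remaining point dominates it.
        layer = [
            i for i in remaining
            if not any(dominates(points[j], points[i]) for j in remaining if j != i)
        ]
        layers.append(layer)
        in_layer = set(layer)
        remaining = [i for i in remaining if i not in in_layer]
    return layers


# Hypothetical similarity vectors (e.g. name similarity, address similarity) per pair.
pairs = [(0.9, 0.8), (0.8, 0.9), (0.5, 0.5), (0.9, 0.7), (0.1, 0.2)]
layers = skyline_layers(pairs)
```

A classifier in the spirit of the abstract would then pick a cut-off layer and label pairs in earlier layers as matches; how that cut-off is chosen is what the SkyEx-* algorithms address and is not reproduced here.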
Scalable Model-Based Management of Correlated Dimensional Time Series in ModelarDB+
To monitor critical infrastructure, high-quality sensors sampled at a high
frequency are increasingly used. However, as they produce huge amounts of data,
only simple aggregates are stored. This removes outliers and fluctuations that
could indicate problems. As a remedy, we present a model-based approach for
managing time series with dimensions that exploits correlation in and among
time series. Specifically, we propose compressing groups of correlated time
series using an extensible set of model types within a user-defined error bound
(possibly zero). We name this new category of model-based compression methods
for time series Multi-Model Group Compression (MMGC). We present the first MMGC
method GOLEMM and extend model types to compress time series groups. We propose
primitives for users to effectively define groups for differently sized data
sets, and based on these, an automated grouping method using only the time
series dimensions. We propose algorithms for executing simple and
multi-dimensional aggregate queries on models. Last, we implement our methods
in the Time Series Management System (TSMS) ModelarDB (ModelarDB+). Our
evaluation shows that compared to widely used formats, ModelarDB+ provides up
to 13.7 times faster ingestion due to high compression, 113 times better
compression due to the adaptivity of GOLEMM, 630 times faster aggregates by
using models, and close to linear scalability. It is also extensible and
supports online query processing.
Comment: 12 Pages, 28 Figures, and 1 Table
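
The idea of compressing a group of correlated time series with a model under a user-defined error bound can be sketched as follows. This is not GOLEMM and not ModelarDB+ code; it is a minimal illustration that assumes a single constant-value model type, an absolute (not relative) error bound, and regularly sampled series given as rows of values, one row per timestamp. The name `compress_constant` and the sample data are hypothetical.

```python
from typing import List, Sequence, Tuple

Segment = Tuple[int, int, float]  # (first row index, last row index, model value)


def compress_constant(rows: List[Sequence[float]], error_bound: float) -> List[Segment]:
    """Greedily cover the rows with constant segments.

    Each row holds the values of all series in the group at one timestamp.
    A segment stores the midrange of every value it covers, so each value
    differs from the stored constant by at most error_bound.
    """
    segments: List[Segment] = []
    start = 0
    lo = hi = None
    for i, row in enumerate(rows):
        row_lo, row_hi = min(row), max(row)
        if (row_hi - row_lo) / 2 > error_bound:
            # A single constant cannot represent this row; a real system
            # would fall back to another model type here.
            raise ValueError(f"row {i} cannot be represented within the bound")
        new_lo = row_lo if lo is None else min(lo, row_lo)
        new_hi = row_hi if hi is None else max(hi, row_hi)
        if lo is not None and (new_hi - new_lo) / 2 > error_bound:
            # Adding this row would break the bound: close the segment, start a new one.
            segments.append((start, i - 1, (lo + hi) / 2))
            start, new_lo, new_hi = i, row_lo, row_hi
        lo, hi = new_lo, new_hi
    if lo is not None:
        segments.append((start, len(rows) - 1, (lo + hi) / 2))
    return segments


# Two hypothetical correlated sensors sampled at three timestamps.
readings = [(10.0, 10.5), (10.25, 10.0), (20.0, 20.5)]
segments = compress_constant(readings, error_bound=0.25)
```

Six readings are reduced to two stored constants here. Because each segment also records the row range it covers, simple aggregates such as a sum can be computed from the segments alone, at the cost of the bounded error.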
- …